{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "authorship_tag": "ABX9TyMUPkO2WTAkNP76DCPwLqi1",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/sugarforever/wtf-langchain/blob/main/05_Output_Parsers/05_Output_Parsers.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# 05 输出解析器\n",
        "\n",
        "LLM的输出为文本，但在程序中除了显示文本，可能希望获得更结构化的数据。这就是输出解析器（Output Parsers）的用武之地。"
      ],
      "metadata": {
        "id": "IkIusM-GD9MR"
      }
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Ceq3MMqkDutF",
        "outputId": "4c479341-e009-4997-cd66-f543d41df4e7"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\u001b[?25l     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/73.6 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m73.6/73.6 kB\u001b[0m \u001b[31m2.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h"
          ]
        }
      ],
      "source": [
        "!pip install -q langchain==0.0.235 openai"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## List Parser\n",
        "\n",
        "List Parser将逗号分隔的文本解析为列表。"
      ],
      "metadata": {
        "id": "-0vNCtPT4zC6"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from langchain.output_parsers import CommaSeparatedListOutputParser\n",
        "\n",
        "output_parser = CommaSeparatedListOutputParser()\n",
        "output_parser.parse(\"black, yellow, red, green, white, blue\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "CUQ6R0V740yX",
        "outputId": "0adbbd93-a090-4770-e17b-31d58e5c4766"
      },
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "['black', 'yellow', 'red', 'green', 'white', 'blue']"
            ]
          },
          "metadata": {},
          "execution_count": 9
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Structured Output Parser\n",
        "\n",
        "当我们想要类似JSON数据结构，包含多个字段时，可以使用这个输出解析器。该解析器可以生成指令帮助LLM返回结构化数据文本，同时完成文本到结构化数据的解析工作。"
      ],
      "metadata": {
        "id": "KK4suFQr468t"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from langchain.output_parsers import StructuredOutputParser, ResponseSchema\n",
        "from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate\n",
        "from langchain.llms import OpenAI\n",
        "\n",
        "# 定义响应的结构(JSON)，两个字段 answer和source。\n",
        "response_schemas = [\n",
        "    ResponseSchema(name=\"answer\", description=\"answer to the user's question\"),\n",
        "    ResponseSchema(name=\"source\", description=\"source referred to answer the user's question, should be a website.\")\n",
        "]\n",
        "output_parser = StructuredOutputParser.from_response_schemas(response_schemas)\n",
        "\n",
        "# 获取响应格式化的指令\n",
        "format_instructions = output_parser.get_format_instructions()\n",
        "\n",
        "format_instructions"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 54
        },
        "id": "Ohc2qbl05A7s",
        "outputId": "87aa976c-f11e-4f78-a9af-698b8e96347a"
      },
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "'The output should be a markdown code snippet formatted in the following schema, including the leading and trailing \"```json\" and \"```\":\\n\\n```json\\n{\\n\\t\"answer\": string  // answer to the user\\'s question\\n\\t\"source\": string  // source referred to answer the user\\'s question, should be a website.\\n}\\n```'"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 10
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# partial_variables允许在代码中预填充提示此模版的部分变量。这类似于接口，抽象类之间的关系\n",
        "prompt = PromptTemplate(\n",
        "    template=\"answer the users question as best as possible.\\n{format_instructions}\\n{question}\",\n",
        "    input_variables=[\"question\"],\n",
        "    partial_variables={\"format_instructions\": format_instructions}\n",
        ")\n",
        "\n",
        "model = OpenAI(temperature=0, openai_api_key=\"您的有效openai api key\")\n",
        "response = prompt.format_prompt(question=\"Who is the CEO of Tesla?\")\n",
        "output = model(response.to_string())\n",
        "output_parser.parse(output)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "zxYWexu25Iuc",
        "outputId": "6886b6fa-feb5-4456-a9b6-554195a7d761"
      },
      "execution_count": 14,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "{'answer': 'Elon Musk is the CEO of Tesla.',\n",
              " 'source': 'https://www.tesla.com/about/leadership'}"
            ]
          },
          "metadata": {},
          "execution_count": 14
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 自定义输出解析器\n",
        "\n",
        "扩展CommaSeparatedListOutputParser，让其返回的列表是经过排序的。"
      ],
      "metadata": {
        "id": "7qYA8JPv6Pnz"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from typing import List\n",
        "class SortedCommaSeparatedListOutputParser(CommaSeparatedListOutputParser):\n",
        "  def parse(self, text: str) -> List[str]:\n",
        "    lst = super().parse(text)\n",
        "    return sorted(lst)\n",
        "\n",
        "output_parser = SortedCommaSeparatedListOutputParser()\n",
        "output_parser.parse(\"black, yellow, red, green, white, blue\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "6M-py_2C6beo",
        "outputId": "922875ac-c93e-4ee6-fad0-74c052bd0b68"
      },
      "execution_count": 15,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "['black', 'blue', 'green', 'red', 'white', 'yellow']"
            ]
          },
          "metadata": {},
          "execution_count": 15
        }
      ]
    }
  ]
}